Acoustic front-end optimization for large vocabulary speech recognition
Abstract
In this paper we describe experiments with the acoustic front-end of our large vocabulary speech recognition system. In particular, two aspects are studied: 1) linear transforms for feature extraction and 2) the modelling of the emission probabilities. Experiments are reported on a 5000-word task of the ARPA Wall Street Journal database. For the linear transforms, our main results are as follows. Filter-bank coefficients yield a word error rate of 9.3%. A cepstral decorrelation reduces the error rate from 9.3% to 8.0%. Applying a linear discriminant analysis (LDA) gives a further reduction in the error rate from 8.0% to 7.1%. Recognition results are similar for an LDA applied to filter-bank outputs and for one applied to cepstral coefficients. The experiments with density modelling gave the following results. Gaussian and Laplacian densities yield similar error rates. A single vector of variances or absolute deviations outperforms density-specific or mixture-specific vectors.
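The abstract gives no formulas. The sketch below illustrates two of its ingredients under common textbook assumptions; it is not the authors' implementation.

- **Cepstral decorrelation.** This is assumed to be an unnormalized DCT-II of the log filter-bank outputs.
- **A single variance or absolute-deviation vector.** This is read as one spread vector shared by all densities. It is estimated from frames paired with the mean of the density they are aligned to.

All function names are hypothetical. The LDA step is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cepstral_transform(log_fbank: Vector, n_ceps: int) -> List[float]:
    """Assumption: decorrelate log filter-bank outputs with an unnormalized DCT-II,
    c_k = sum_j x_j * cos(pi * k * (j + 0.5) / J), for k = 0 .. n_ceps - 1."""
    num_bands = len(log_fbank)
    return [
        sum(x * math.cos(math.pi * k * (j + 0.5) / num_bands)
            for j, x in enumerate(log_fbank))
        for k in range(n_ceps)
    ]


def gaussian_neg_log(x: Vector, mean: Vector, sigma: Vector) -> float:
    """-log of a Gaussian density with diagonal covariance diag(sigma**2)."""
    return sum(
        0.5 * ((xi - mi) / si) ** 2 + math.log(si) + 0.5 * math.log(2.0 * math.pi)
        for xi, mi, si in zip(x, mean, sigma)
    )


def laplacian_neg_log(x: Vector, mean: Vector, dev: Vector) -> float:
    """-log of a Laplacian density, p(x) = prod_d exp(-|x_d - m_d| / b_d) / (2 b_d)."""
    return sum(
        abs(xi - mi) / di + math.log(2.0 * di)
        for xi, mi, di in zip(x, mean, dev)
    )


def pooled_spread(frames: Sequence[Vector], aligned_means: Sequence[Vector],
                  laplacian: bool) -> List[float]:
    """Assumption: estimate ONE spread vector shared by every density.

    Each frame is paired with the mean of the density it is aligned to.
    Gaussian: root-mean-square residual (ML standard deviation for a fixed mean).
    Laplacian: mean absolute residual (ML scale for a fixed mean)."""
    dim = len(frames[0])
    totals = [0.0] * dim
    for x, mu in zip(frames, aligned_means):
        for d in range(dim):
            r = x[d] - mu[d]
            totals[d] += abs(r) if laplacian else r * r
    n = len(frames)
    if laplacian:
        return [t / n for t in totals]
    return [math.sqrt(t / n) for t in totals]


# Usage: two 2-dimensional frames, both aligned to a density with mean (0, 0).
frames = [[1.0, 2.0], [3.0, 2.0]]
means = [[0.0, 0.0], [0.0, 0.0]]
sigma = pooled_spread(frames, means, laplacian=False)  # [sqrt(5), 2.0]
dev = pooled_spread(frames, means, laplacian=True)     # [2.0, 2.0]
```

Sharing one spread vector means every density differs only in its mean. This leaves far fewer parameters to estimate than density-specific or mixture-specific variances.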
Similar papers
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving items from this archive, is one of the key issues for this broadcasting...
A Hybrid HMM/SVM Classifier for Wavelet Front End Robust Automatic Speech Recognition
Noisy ambient conditions pose a challenge to speech recognition by increasing acoustic confusability. This calls for powerful acoustic models that improve the generalization ability of the machine learning and the recognition accuracy. This paper discusses a hybrid classifier that harnesses the power of hidden Markov models (HMM) and discriminative support vector machines (SVM) a...
A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition
Acoustic feature extraction from speech constitutes a fundamental component of automatic speech recognition (ASR) systems. In this paper, we propose a novel feature extraction algorithm, perceptual-MVDR (PMVDR), which computes cepstral coefficients from the speech signal. This new feature representation is shown to better model the speech spectrum compared to traditional feature extraction appr...
Evaluation of ETSI advanced DSR front-end and bias removal method on the Japanese newspaper article sentences speech corpus
In October 2002, the European Telecommunications Standards Institute (ETSI) recommended a standard Distributed Speech Recognition (DSR) advanced front-end, ETSI ES202 050 version 1.1.1 (ES202). Many studies use this front-end in noisy environments for several languages on connected-digit recognition tasks. However, we have not seen reports of large vocabulary continuous speech recognition using ...
Robust spoken language identification using large vocabulary speech recognition
A robust, task-independent spoken Language Identification (LID) system is described. It uses a Large Vocabulary Continuous Speech Recognition (LVCSR) module for each language to choose the most likely language spoken. The acoustic analysis uses mean cepstral removal on mel-scale cepstral coefficients to compensate for different input channels. The system has been trained on 5 languages: English, ...
Publication date: 1997